Introducing linguistic constraints into statistical language modeling
نویسنده
چکیده
Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper the notion of function and content words (open/closed word classes) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions, personal pronouns { content words are nouns, verbs, adjectives and adverbs. Based on this class de nition resulting in function and content word markers, a new language model is de ned. A combination of the word-based model with this new model will be introduced. The combined model shows modest improvements both in perplexity results and recognition performance.
منابع مشابه
The Impact of Linguistics on Conceptual Models: Consistency and Understandability
This paper describes a vision in which linguistic knowledge and theories are introduced into conceptual modeling, and it sums up the advantages achieved by this approach. We will show how the extension of conceptual modeling techniques with linguistic theories increases their expressive power, the capability to formalize well-known conceptual aspects, like object roles and constraints, and thei...
متن کاملTowards a unified framework for sub-lexical and supra-lexical linguistic modeling
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains to be one of the fundam...
متن کاملStatistical and Linguistic Clustering for Language Modeling in ASR
In this work several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The language models proposed were evaluated, as usual, in terms of perplexity of the text corpus. Then they were integrated into an ASR system and also evaluated in terms of system performance. It can be seen that category-based...
متن کاملTowards Automatic Identification Of Singing Language In Popular Music Recordings
The automatic analysis of singing from music is an important and challenging issue within the research target of content-based retrieval of music information. As part of this research target, this study presents a first attempt to automatically identify the language sung in a music recording. It is assumed that each language has its own set of constraints that specify which of the basic linguis...
متن کاملA New Framework for Sign Language Recognition based on 3D Handshape Identification and Linguistic Modeling
Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method les...
متن کامل